Information processing apparatus, control method, and program

ABSTRACT

An information processing apparatus (2000) analyzes a captured image (12) generated by a camera (10) to determine a motion of a person. The camera (10) images a display place where an item is displayed. The information processing apparatus (2000) detects a reference position (24) from the captured image (12). The reference position (24) indicates a position of a hand of the person. Using the reference position (24), the information processing apparatus (2000) decides an analysis target region (30) to be analyzed in the captured image (12). The information processing apparatus (2000) analyzes the analysis target region (30) to determine the motion of the person.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 17/349,045, filed on Jun. 16, 2021, which is a continuation application of U.S. patent application Ser. No. 16/623,656, filed on Dec. 17, 2019, which is a national stage application of International Application No. PCT/JP2017/022875, filed on Jun. 21, 2017, the disclosure of which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present invention relates to image analysis.

Background Art

In a store, a customer takes out and purchases a product displayed in a display place (for example, a product shelf). The customer may also return a product once picked up to the display place. Techniques have been developed for analyzing such actions of customers with respect to displayed products.

For example, Patent Document 1 discloses a technique of detecting that an object (a hand of a person) enters a predetermined region (a shelf) using a depth image obtained from the imaging result of a depth camera, and of determining a motion of a customer using color images near the entry position before and after the entry. Specifically, a color image including the hand of the person entering the predetermined region is compared with a color image including the hand of the person leaving the predetermined region, and the motion of the person is determined to be “acquisition of product” in a case where an increase in color exceeds a threshold value, “return of product” in a case where a decrease in color exceeds a threshold value, and “contact” in a case where the change in color is less than a threshold value. Further, Patent Document 1 discloses a technique of determining the increase or decrease in the volume of a subject from information on the size of the subject obtained from the imaging result of the depth camera, to distinguish between acquisition and return of a product.
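To make the prior-art rule concrete, the following Python sketch classifies one entry and exit of a hand from a single scalar color measure taken before and after the entry. How that measure is computed from the color images (for example, a count of product-colored pixels near the entry position) is an assumption and is not specified above; the function name is illustrative only.

```python
def classify_by_color_change(color_before: float, color_after: float,
                             threshold: float) -> str:
    """Illustrative threshold rule in the style of Patent Document 1.

    color_before: assumed scalar color measure when the hand enters the region.
    color_after:  the same measure when the hand leaves the region.
    """
    change = color_after - color_before
    if change > threshold:
        return "acquisition of product"  # color increased beyond the threshold
    if -change > threshold:
        return "return of product"       # color decreased beyond the threshold
    return "contact"                     # change too small in either direction
```

Because the decision depends only on the magnitude of the change, a small product produces a small change and falls into “contact”, which is the weakness discussed under Technical Problem.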

RELATED DOCUMENT

Patent Document

-   [Patent Document 1] US Patent Application Publication No. 2014/0132728

SUMMARY OF THE INVENTION

Technical Problem

The degree of increase or decrease in color or volume before and after the entry of the hand of the person into the display place is affected by, for example, the size of the product and the pose of the hand of the person. For example, in a case where a small product is taken out from the display place, the increase in color and volume before and after the motion is small. Further, a motion that merely changes the pose of the hand may be erroneously recognized as a motion of acquiring the product.

The present invention has been made in view of the above problems. One of the objects of the present invention is to provide a technique of determining a motion of a person with respect to a displayed item with high accuracy.

Solution to Problem

The information processing apparatus according to the present invention includes: 1) a detection unit that detects, from a captured image in which a display place of an item is imaged, a reference position indicating a position of a hand of a person included in the captured image; 2) a deciding unit that decides an analysis target region in the captured image using the detected reference position; and 3) a determination unit that analyzes the decided analysis target region to determine a motion of the person.

A control method of the present invention is executed by a computer. The control method includes: 1) a detection step of detecting, from a captured image in which a display place of an item is imaged, a reference position indicating a position of a hand of a person included in the captured image; 2) a deciding step of deciding an analysis target region in the captured image using the detected reference position; and 3) a determination step of analyzing the decided analysis target region to determine a motion of the person.

A program of the present invention causes a computer to execute each step of the control method of the present invention.

Advantageous Effects of Invention

According to the present invention, there is provided a technique of determining the motion of a person with respect to a displayed item with high accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

The objects described above and other objects, features, and advantages will become more apparent from the preferred example embodiments described below and the following drawings accompanying them.

FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to example embodiment 1.

FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus according to example embodiment 1.

FIG. 3 is a diagram illustrating a computer for forming the information processing apparatus.

FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to example embodiment 1.

FIG. 5 is a first diagram illustrating an imaging range of a camera.

FIG. 6 is a second diagram illustrating the imaging range of the camera.

FIG. 7 is a diagram illustrating a case where a captured image includes a scene in which a product shelf is imaged from the right side as viewed from the front.

FIGS. 8A and 8B are diagrams illustrating an analysis target region that is decided as a region having a predetermined shape defined with a reference position as a reference.

FIG. 9 is a diagram illustrating a case where an orientation of the analysis target region is defined based on an orientation of a hand of a customer.

FIG. 10 is a flowchart illustrating a flow of processing for determining a motion of a customer 20.

FIG. 11 is a flowchart illustrating the flow of the processing for determining the motion of the customer 20.

FIG. 12 is a diagram illustrating display information in a table format.

FIG. 13 is a diagram illustrating a depth image generated by the camera.

FIG. 14 is a diagram illustrating display information indicating a range of a distance from a camera for each stage of the product shelf.

DESCRIPTION OF EMBODIMENTS

Hereinafter, example embodiments of the present invention will be described with reference to the drawings. Note that, in all the drawings, the same reference numeral is assigned to the same component, and the description thereof will not be repeated. Further, unless otherwise described, each block in each block diagram represents a configuration of functional units rather than a configuration of hardware units.

Example Embodiment 1

Outline of Operation of Information Processing Apparatus 2000

FIG. 1 is a diagram conceptually illustrating an operation of the information processing apparatus according to example embodiment 1 (the information processing apparatus 2000 shown in FIG. 2 and described below). Note that FIG. 1 is provided to make the operation of the information processing apparatus 2000 easy to understand; the operation of the information processing apparatus 2000 is not limited by FIG. 1.

The information processing apparatus 2000 analyzes a captured image 12 generated by a camera 10 to determine a motion of a person. The camera 10 images a display place where an item is displayed. The camera 10 repeatedly performs imaging and generates a plurality of captured images 12. The plurality of generated captured images 12 are, for example, a frame group constituting video data. However, the plurality of captured images 12 generated by the camera 10 do not necessarily need to constitute video data and may be handled as individual pieces of still image data.

The item to be imaged by the camera 10 can be any item that is displayed at the display place and that a person takes out from the display place or, conversely, places (returns) in the display place. The specific item to be imaged by the camera 10 varies depending on the usage environment of the information processing apparatus 2000.

For example, it is assumed that the information processing apparatus2000 is used to determine the motion of a customer or a store clerk in astore. In this case, the item to be imaged by the camera 10 is a productsold in the store. Further, the display place described above is, forexample, a product shelf. In FIG. 1 , the information processingapparatus 2000 is used to determine the motion of a customer 20.Therefore, the person and the item to be imaged by the camera 10 arerespectively the customer 20 and a product 40. Further, the displayplace is a product shelf 50.

In addition, for example, assume that the information processing apparatus 2000 is used to determine the motion of a factory worker or the like. In this case, the person to be imaged by the camera 10 is the worker or the like, the item to be imaged by the camera 10 is a material, a tool, or the like used in the factory, and the display place is a shelf installed in, for example, a warehouse of the factory.

For ease of explanation, unless otherwise noted in this specification, the case where the information processing apparatus 2000 is used to determine the motion of a customer (the customer 20 in FIG. 1) in a store will be described as an example. Therefore, the “motion of person” determined by the determination unit 2060 is assumed to be the “motion of customer”, the “item” imaged by the camera is assumed to be the “product”, and the “display place” is assumed to be the “product shelf”.

The information processing apparatus 2000 detects a reference position 24 from the captured image 12. The reference position 24 indicates a position of a hand of the person, for example, the center position of the hand or the position of a fingertip. Using this reference position 24, the information processing apparatus 2000 decides a region to be analyzed (analysis target region 30) in the captured image 12. The information processing apparatus 2000 analyzes the analysis target region 30 to determine the motion of the customer 20. For example, the motion of the customer 20 is a motion of holding the product 40, a motion of taking out the product 40 from the product shelf 50, or a motion of placing the product 40 on the product shelf 50.

Advantageous Effect

If image analysis were performed on the entire captured image 12 to determine the motion of the customer 20, the motion might not be determined accurately when the product 40 is small or when the pose of the hand of the customer 20 varies significantly. In contrast, the information processing apparatus 2000 first detects the reference position 24 indicating the position of the hand of the customer 20 in the captured image 12 and decides the analysis target region 30 based on the reference position 24. That is, the image analysis is performed near the hand of the customer 20. Therefore, even when the product 40 is small or the pose of the hand of the customer 20 varies significantly, it is possible to determine, with high accuracy, a motion performed by the hand of the customer 20, such as acquiring the product 40, placing the product 40, or holding the product 40.

Hereinafter, the information processing apparatus 2000 according to the present example embodiment will be described in more detail.

Example of Functional Configuration of Information Processing Apparatus 2000

FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus 2000 according to example embodiment 1. The information processing apparatus 2000 has a detection unit 2020, a deciding unit 2040, and a determination unit 2060. The detection unit 2020 detects, from the captured image 12, the reference position 24 of the hand of the person included in the captured image 12. The deciding unit 2040 decides the analysis target region 30 in the captured image 12 using the detected reference position 24. The determination unit 2060 analyzes the decided analysis target region 30 to determine the motion of the person.

Example of Hardware Configuration of Information Processing Apparatus 2000

Each functional configuration unit of the information processing apparatus 2000 may be formed by hardware that forms the functional configuration unit (for example, a hard-wired electronic circuit) or by a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls the circuit). Hereinafter, the case where each functional configuration unit of the information processing apparatus 2000 is formed by a combination of hardware and software will be further described.

FIG. 3 is a diagram illustrating a computer 1000 for forming the information processing apparatus 2000. The computer 1000 may be any of various computers, for example, a personal computer (PC), a server machine, a tablet terminal, or a smartphone. The computer 1000 may also be the camera 10. The computer 1000 may be a dedicated computer designed to form the information processing apparatus 2000 or may be a general-purpose computer.

The computer 1000 has a bus 1020, a processor 1040, a memory 1060, a storage device 1080, an input and output interface 1100, and a network interface 1120. The bus 1020 is a data transmission path through which the processor 1040, the memory 1060, the storage device 1080, the input and output interface 1100, and the network interface 1120 mutually transmit and receive data. However, the method of mutually connecting the processor 1040 and the other components is not limited to bus connection. The processor 1040 is an arithmetic apparatus such as a central processing unit (CPU) or a graphics processing unit (GPU). The memory 1060 is a main storage apparatus formed by a random access memory (RAM) or the like. The storage device 1080 is an auxiliary storage apparatus formed by a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. However, the storage device 1080 may be configured by hardware similar to that used to configure the main storage apparatus, such as a RAM.

The input and output interface 1100 is an interface for connecting the computer 1000 to an input and output device. The network interface 1120 is an interface for connecting the computer 1000 to a communication network, for example, a local area network (LAN) or a wide area network (WAN). The network interface 1120 may be connected to the communication network by a wireless connection or a wired connection.

For example, the computer 1000 is communicably connected to the camera 10 through a network. However, the method of communicably connecting the computer 1000 to the camera 10 is not limited to connection through a network. Furthermore, the computer 1000 does not necessarily need to be communicably connected to the camera 10 as long as it can acquire the captured image 12 generated by the camera 10.

The storage device 1080 stores program modules that form the respective functional configuration units (the detection unit 2020, the deciding unit 2040, and the determination unit 2060) of the information processing apparatus 2000. The processor 1040 reads each of the program modules into the memory 1060 and executes it to realize the function corresponding to that program module.

About Camera 10

The camera 10 is any camera that can repeatedly perform imaging and generate a plurality of captured images 12. The camera 10 may be a video camera that generates video data or a still camera that generates still image data. In the former case, the captured image 12 is a video frame constituting the video data.

The camera 10 may be a two-dimensional camera or a three-dimensional camera (a stereo camera or a depth camera). In a case where the camera 10 is a depth camera, the captured image 12 may be a depth image, that is, an image in which the value of each pixel represents the distance between the imaged item and the camera. Furthermore, the camera 10 may be an infrared camera.

As described above, the computer 1000 that forms the information processing apparatus 2000 may be the camera 10. In this case, the camera 10 analyzes the captured image 12 that it generates itself to determine the motion of the customer 20. As a camera 10 having such a function, for example, a camera called an intelligent camera, a network camera, or an Internet protocol (IP) camera can be used.

Flow of Processing

FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to example embodiment 1. The detection unit 2020 acquires the captured image 12 (S102). The detection unit 2020 detects the reference position 24 of the hand of the customer 20 from the acquired captured image 12 (S104). The deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S106). The determination unit 2060 performs image analysis on the decided analysis target region 30 (S108). The determination unit 2060 determines the motion of the customer 20 based on the result of the image analysis of the analysis target region 30 (S110).

Here, a plurality of captured images 12 may be used to determine the motion of the customer 20. In this case, the image analysis is performed on the analysis target regions 30 decided for each of the plurality of captured images 12 (that is, image analysis is performed on a plurality of analysis target regions 30) to determine the motion of the customer 20. That is, the processing of S102 to S108 is performed for each of the plurality of captured images 12, and the processing of S110 is performed using the results.
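The flow of FIG. 4 over several captured images 12 can be sketched as follows. This is a minimal Python sketch, not the apparatus itself: `detect`, `decide`, `analyze`, and `determine` are hypothetical callables standing in for S104, S106, S108, and S110, and recording an empty result for frames in which no hand is detected is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]              # reference position 24 as (x, y)
Region = Tuple[int, int, int, int]   # analysis target region 30 as (x0, y0, x1, y1)


@dataclass
class FrameResult:
    reference_position: Optional[Point]
    region: Optional[Region]
    analysis: Any


def run_pipeline(images: Sequence[Any],
                 detect: Callable[[Any], Optional[Point]],
                 decide: Callable[[Any, Point], Region],
                 analyze: Callable[[Any, Region], Any],
                 determine: Callable[[List[FrameResult]], str]) -> str:
    results: List[FrameResult] = []
    for image in images:                       # S102: each acquired captured image
        ref = detect(image)                    # S104: reference position of the hand
        if ref is None:                        # assumption: no hand, empty result
            results.append(FrameResult(None, None, None))
            continue
        region = decide(image, ref)            # S106: analysis target region
        analysis = analyze(image, region)      # S108: image analysis of the region
        results.append(FrameResult(ref, region, analysis))
    return determine(results)                  # S110: motion from all results
```

Keeping S110 separate from the per-frame loop mirrors the description: the motion is decided only after S102 to S108 have produced results for every frame.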

Timing when Information Processing Apparatus 2000 Executes Processing

The information processing apparatus 2000 can execute the series of processing shown in FIG. 4 at various timings. For example, each time the camera 10 generates a captured image 12, the information processing apparatus 2000 executes the series of processing shown in FIG. 4 on that captured image 12.

In addition, for example, the information processing apparatus 2000 executes the series of processing shown in FIG. 4 at a predetermined time interval (for example, every second). In this case, for example, the information processing apparatus 2000 acquires the latest captured image 12 generated by the camera 10 at the timing of starting the series of processing shown in FIG. 4.
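The periodic variant can be sketched as a frame-selection rule: at each tick, take the latest captured image 12 generated at or before that tick. The sketch assumes sorted per-frame timestamps in seconds and ticks starting at the first frame; `frames_to_process` is a hypothetical helper.

```python
from typing import List, Sequence


def frames_to_process(timestamps: Sequence[float], interval: float) -> List[int]:
    """Index of the latest frame available at each periodic tick.

    Assumes `timestamps` is sorted ascending; ticks start at timestamps[0]
    and repeat every `interval` seconds until the last frame's timestamp.
    """
    if not timestamps:
        return []
    selected: List[int] = []
    tick = timestamps[0]
    i = 0
    while tick <= timestamps[-1]:
        # Advance to the latest frame generated at or before this tick.
        while i + 1 < len(timestamps) and timestamps[i + 1] <= tick:
            i += 1
        selected.append(i)
        tick += interval
    return selected
```

For frames every 0.3 s and a 1 s interval, this selects the frames at 0.0 s, 0.9 s, and 1.8 s; the frames in between are skipped rather than queued.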

Acquisition of Captured Image 12: S102

The detection unit 2020 acquires the captured image 12 (S102). The detection unit 2020 may acquire the captured image 12 by any method. For example, the detection unit 2020 receives the captured image 12 transmitted from the camera 10. Alternatively, for example, the detection unit 2020 accesses the camera 10 and acquires the captured image 12 stored in the camera 10.

Note that the camera 10 may store the captured image 12 in a storage apparatus provided outside the camera 10. In this case, the detection unit 2020 accesses the storage apparatus and acquires the captured image 12.

In a case where the information processing apparatus 2000 is formed by the camera 10, the information processing apparatus 2000 acquires the captured image 12 generated by the information processing apparatus 2000 itself. In this case, the captured image 12 is stored in, for example, the memory 1060 or the storage device 1080 (refer to FIG. 3) inside the information processing apparatus 2000. Therefore, the detection unit 2020 acquires the captured image 12 from the memory 1060 or the storage device 1080.

The captured image 12 (that is, the imaging range of the camera 10) includes at least a range in front of the product shelf 50. FIG. 5 is a first diagram illustrating the imaging range of the camera 10. In FIG. 5, an imaging range 14 of the camera 10 includes a range extending a distance d1 forward from the front surface of the product shelf 50.

Note that the imaging range of the camera 10 need not include the product shelf 50. FIG. 6 is a second diagram illustrating the imaging range of the camera 10. In FIG. 6, the imaging range 14 of the camera 10 includes the range from a position d2 in front of the front surface of the product 40 to a position d3 in front of the front surface of the product 40.

Further, the captured images 12 in FIGS. 5 and 6 include scenes in which the product shelf 50 is viewed from above. In other words, the camera 10 is installed so as to image the product shelf 50 from above. However, the captured image 12 need not include a scene in which the product shelf 50 is viewed from above. For example, the captured image 12 may include a scene in which the product shelf 50 is imaged from the side. FIG. 7 is a diagram illustrating a case where the captured image 12 includes a scene in which the product shelf 50 is imaged from the right side as viewed from the front.

Detection of Reference Position 24: S104

The detection unit 2020 detects the reference position 24 from the captured image 12 (S104). As described above, the reference position 24 indicates the position of the hand of the customer 20, for example, the center position of the hand or the position of a fingertip. There are various methods for the detection unit 2020 to detect the reference position 24 from the captured image 12. For example, the detection unit 2020 performs feature value matching using a feature value of a human hand prepared in advance, and detects, from the captured image 12, a region matching the feature value (that is, a region with high similarity to the feature value). The detection unit 2020 then detects a predetermined position (for example, the center position) of the detected region, that is, the region representing the hand, as the reference position 24 of the hand.
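As one deliberately simple instance of feature value matching, the sketch below uses raw grayscale intensities as the feature and the sum of squared differences (SSD) as the dissimilarity, and returns the center of the best-matching window as the reference position 24. The grayscale template, the SSD criterion, and the optional `max_ssd` rejection threshold are assumptions; a practical detector would use more robust features and a faster search.

```python
from typing import Optional, Tuple

import numpy as np


def detect_reference_position(image: np.ndarray, template: np.ndarray,
                              max_ssd: Optional[float] = None
                              ) -> Optional[Tuple[int, int]]:
    """Return (x, y), the center of the window most similar to `template`.

    Brute-force search over every window position; similarity is measured
    by the SSD between the window and the template (lower is better).
    """
    img = image.astype(float)
    tpl = template.astype(float)
    ih, iw = img.shape
    th, tw = tpl.shape
    best_ssd, best_xy = np.inf, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = float(np.sum((img[y:y + th, x:x + tw] - tpl) ** 2))
            if ssd < best_ssd:
                # Predetermined position of the matched region: its center.
                best_ssd, best_xy = ssd, (x + tw // 2, y + th // 2)
    if best_xy is None or (max_ssd is not None and best_ssd > max_ssd):
        return None  # no window is similar enough to count as a hand
    return best_xy
```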

In addition, for example, the detection unit 2020 may detect the reference position 24 using machine learning. Specifically, the detection unit 2020 is configured as a detector using machine learning. In this case, the detection unit 2020 is trained in advance using one or more captured images in which the reference positions 24 are known (each training sample being a set of a captured image and the coordinates of the reference position 24 in that captured image). This enables the detection unit 2020 to detect the reference position 24 from the acquired captured image 12. Note that various models, such as a neural network, can be used as the machine learning prediction model.

Here, the detection unit 2020 is preferably trained on hands of customers 20 in various poses. Specifically, captured images for training are prepared for hands of customers 20 in various poses. This makes it possible to detect the reference position 24 from each captured image 12 with high accuracy even when the pose of the hand of the customer 20 differs from one captured image 12 to another.

Here, the detection unit 2020 may detect various parameters relating to the hand of the customer 20 in addition to the reference position 24. For example, the detection unit 2020 detects the width, length, and pose of the hand, and the distance between the reference position 24 and the camera 10. In a case where feature value matching is used, the detection unit 2020 determines the width, length, pose, and the like of the hand from the shape and size of the detected hand region. In a case where machine learning is used, the detection unit 2020 is trained using one or more captured images in which the width, length, and pose of the hand, the distance between the reference position 24 and the camera 10, and the like are known. This enables the detection unit 2020 to detect various parameters, such as the width of the hand, in addition to the reference position 24 from the acquired captured image 12.
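Assuming the detected hand region is available as a binary mask, its size and pose parameters can be derived with standard image moments, as in this sketch. Using the bounding box for width and length and the principal axis of the second-order central moments for the pose angle are assumptions, not the only option; the `area` value is the kind of hand-region area that equation (1) below uses.

```python
import math
from typing import Dict

import numpy as np


def hand_region_parameters(mask: np.ndarray) -> Dict[str, object]:
    """Size and orientation of a binary hand mask (True = hand pixel)."""
    ys, xs = np.nonzero(mask)          # row (y) and column (x) indices
    if xs.size == 0:
        return {}                      # no hand pixels detected
    cx, cy = xs.mean(), ys.mean()      # centroid of the hand region
    dx, dy = xs - cx, ys - cy
    mu20 = (dx * dx).mean()            # second-order central moments
    mu02 = (dy * dy).mean()
    mu11 = (dx * dy).mean()
    return {
        "width": int(xs.max() - xs.min() + 1),   # bounding-box width
        "height": int(ys.max() - ys.min() + 1),  # bounding-box length
        "area": int(xs.size),                    # number of hand pixels
        "center": (float(cx), float(cy)),
        # Major principal axis angle in radians from the +x axis (y points down).
        "angle": 0.5 * math.atan2(2 * mu11, mu20 - mu02),
    }
```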

Decision of Analysis Target Region 30: S106

The deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S106). There are various methods for the deciding unit 2040 to decide the analysis target region 30. For example, the deciding unit 2040 decides, as the analysis target region 30, a region that is included in the captured image 12 and has a predetermined shape defined with the reference position 24 as a reference.

FIGS. 8A and 8B are diagrams illustrating the analysis target region 30 decided as a region having a predetermined shape defined with the reference position 24 as a reference. FIG. 8A represents a case where the reference position 24 is used as a position representing a predetermined position of the analysis target region 30. Specifically, the analysis target region 30 in FIG. 8A is a rectangle having a height h and a width w, with the reference position 24 as its center. Note that the reference position 24 may be used as a position defining a point other than the center of the analysis target region 30, such as the upper left corner or the lower right corner of the analysis target region 30.

FIG. 8B represents a case where a predetermined position (the center, the upper left corner, or the like) of the analysis target region 30 is defined by a position having a predetermined relationship with the reference position 24. Specifically, the analysis target region 30 in FIG. 8B is a rectangle centered on the position obtained by moving the reference position 24 by a predetermined vector v. The size and orientation of this rectangle are the same as those of the analysis target region 30 in FIG. 8A.
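Both placements in FIGS. 8A and 8B reduce to computing a w × h rectangle around a center point. The sketch below returns corner coordinates (x0, y0, x1, y1) for a rectangle centered at the reference position 24 plus an optional offset v, so v = (0, 0) corresponds to FIG. 8A. The function name and the optional clipping to the image bounds are assumptions added for illustration.

```python
from typing import Optional, Tuple


def analysis_target_region(ref: Tuple[float, float], w: float, h: float,
                           v: Tuple[float, float] = (0.0, 0.0),
                           image_size: Optional[Tuple[int, int]] = None
                           ) -> Tuple[float, float, float, float]:
    """Axis-aligned w x h rectangle centered at ref + v, as (x0, y0, x1, y1)."""
    cx, cy = ref[0] + v[0], ref[1] + v[1]   # FIG. 8B: shift by vector v
    x0, y0 = cx - w / 2, cy - h / 2
    x1, y1 = cx + w / 2, cy + h / 2
    if image_size is not None:              # optional clip to (width, height)
        iw, ih = image_size
        x0, y0 = max(0.0, x0), max(0.0, y0)
        x1, y1 = min(float(iw), x1), min(float(ih), y1)
    return x0, y0, x1, y1
```

For example, a reference position of (50, 40) with w = 20 and h = 10 gives (40, 35, 60, 45); adding v = (0, -15) moves the same rectangle up to (40, 20, 60, 30).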

In the examples of FIGS. 8A and 8B, the orientation of the analysis target region 30 is defined based on the axial directions of the captured image 12. More specifically, the height direction of the analysis target region 30 is defined as the Y-axis direction of the captured image 12. However, the orientation of the analysis target region 30 may be defined based on a direction other than the axial directions of the captured image 12.

For example, assume that the detection unit 2020 detects the pose of the hand of the customer 20. In this case, the orientation of the analysis target region 30 may be defined based on the orientation of the hand. FIG. 9 is a diagram illustrating a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20. In FIG. 9, the orientation of the analysis target region 30 is defined as the depth direction of the hand (the direction from the wrist to the fingertip).

Note that, in a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20 as described above, the orientation of the analysis target region 30 may differ among the plurality of captured images 12. Therefore, the deciding unit 2040 preferably performs a geometric transformation such that the orientations of the plurality of analysis target regions 30 are aligned. For example, the deciding unit 2040 extracts the analysis target region 30 from each captured image 12 and performs the geometric transformation on each extracted analysis target region 30 such that the depth direction of the hand of the customer 20 faces the Y-axis direction.
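One way to realize this alignment is to resample the region directly in a rotated frame whose +Y axis follows the wrist-to-fingertip direction, so extraction and geometric transformation happen in one step. The sketch below does this with nearest-neighbor sampling; the `direction` input (taken from the detected hand pose), the sampling scheme, and zero-filling outside the image are assumptions. Image coordinates are (x, y) with y pointing down.

```python
from typing import Tuple

import numpy as np


def extract_aligned_region(image: np.ndarray, ref: Tuple[float, float],
                           direction: Tuple[float, float],
                           w: int, h: int) -> np.ndarray:
    """Sample an h x w patch centered at ref whose +Y axis follows `direction`.

    direction: (dx, dy) from wrist to fingertip in image coordinates.
    Nearest-neighbor sampling; samples falling outside the image are 0.
    """
    d = np.asarray(direction, dtype=float)
    ey = d / np.linalg.norm(d)          # output +Y = depth direction of the hand
    ex = np.array([ey[1], -ey[0]])      # output +X, a proper rotation of the axes
    center = np.asarray(ref, dtype=float)
    out = np.zeros((h, w), dtype=image.dtype)
    us = np.arange(w) - (w - 1) / 2.0   # column offsets from the center
    vs = np.arange(h) - (h - 1) / 2.0   # row offsets from the center
    for r, v in enumerate(vs):
        for c, u in enumerate(us):
            x, y = center + u * ex + v * ey
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < image.shape[0] and 0 <= xi < image.shape[1]:
                out[r, c] = image[yi, xi]
    return out
```

With `direction=(0, 1)` the result equals an ordinary axis-aligned crop; any other direction yields the same crop rotated so that the hand always points down the rows, which is the alignment described above.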

The size of the analysis target region 30 may be defined statically or decided dynamically. In the latter case, the size of the analysis target region 30 is decided by, for example, the following equation (1).

[Equation 1]

$$h = h_b \times \frac{s_r}{s_b}, \qquad w = w_b \times \frac{s_r}{s_b} \tag{1}$$

Here, $h$ and $w$ are the height and width of the analysis target region 30, respectively. $s_b$ is a reference area defined in advance for the hand region. $h_b$ and $w_b$ are the height and width of the analysis target region 30, respectively, defined in advance in association with the reference area. $s_r$ is the area of the hand region detected from the captured image 12 by the detection unit 2020.
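As a worked illustration of equation (1) with hypothetical numbers: with a reference area of 2000 pixels, a reference height of 120 and width of 100, and a detected hand area of 3000 pixels, the scale factor is 3000 / 2000 = 1.5, so the region becomes 180 × 150 pixels. The same computation in Python, with illustrative parameter names:

```python
from typing import Tuple


def region_size_from_area(s_r: float, s_b: float,
                          h_b: float, w_b: float) -> Tuple[float, float]:
    """Equation (1): scale the reference size by the hand-area ratio s_r / s_b."""
    scale = s_r / s_b
    return h_b * scale, w_b * scale
```

A larger detected hand (closer to the camera, for instance) therefore yields a proportionally larger analysis target region 30.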

In addition, for example, the size of the analysis target region 30 may be decided dynamically using the following equation (2).

[Equation 2]

$$h = h_b \times \frac{d_b}{d_r}, \qquad w = w_b \times \frac{d_b}{d_r} \tag{2}$$

Here, $h$ and $w$ are the height and width of the analysis target region 30, respectively. $d_b$ is a reference distance value defined in advance. $h_b$ and $w_b$ are the height and width of the analysis target region 30, respectively, associated with the reference distance value. $d_r$ is the distance value between the reference position 24 detected from the captured image 12 and the camera 10.

There are various methods of determining the distance value $d_r$. For example, the detection unit 2020 determines the distance value $d_r$ based on the pixel value at the reference position 24 in the depth image generated by the depth camera. In addition, for example, when the detection unit 2020 is configured as a detector using machine learning, the detection unit 2020 may be configured to detect the distance between the reference position 24 and the camera 10 in addition to the reference position 24.
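The sketch below combines equation (2) with the depth-image method just described: the distance value is read from the depth image at the reference position 24 and used to scale the reference size. Treating a non-positive depth value as "no measurement" is an assumption (some depth cameras report 0 where no measurement is available), as are the function and parameter names.

```python
from typing import Tuple

import numpy as np


def region_size_from_depth(depth_image: np.ndarray, ref: Tuple[int, int],
                           d_b: float, h_b: float, w_b: float
                           ) -> Tuple[float, float, float]:
    """Equation (2) with d_r taken from a depth image; returns (h, w, d_r)."""
    x, y = ref                          # reference position 24 as (x, y) pixels
    d_r = float(depth_image[y, x])      # depth value at the reference position
    if d_r <= 0:
        raise ValueError("no valid depth at the reference position")
    scale = d_b / d_r                   # farther hand -> smaller region
    return h_b * scale, w_b * scale, d_r
```

For instance, with a reference distance of 1000 and a hand measured at 2000 in the same units, a 120 × 90 reference region shrinks to 60 × 45.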

Here, each pixel of the analysis target region 30 decided by the above method may be corrected, and the corrected analysis target region 30 may be used for the image analysis by the determination unit 2060. The deciding unit 2040 corrects each pixel in the analysis target region 30 using, for example, the following equation (3).

[Equation 3]

$$d_{(x,y)1} = d_{(x,y)0} + (d_r - d_b) \tag{3}$$

Here, $d_{(x,y)0}$ is the pixel value before the correction at coordinates $(x, y)$ of the analysis target region 30 in the captured image 12, and $d_{(x,y)1}$ is the pixel value after the correction at the same coordinates. $d_r$ and $d_b$ are as defined for equation (2).
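Equation (3) shifts every pixel of the analysis target region 30 by the same offset. A minimal sketch, assuming the captured image 12 is a depth image held as a NumPy array and the region is given as integer pixel bounds (x0, y0, x1, y1) with exclusive upper bounds; the function name is hypothetical.

```python
from typing import Tuple

import numpy as np


def corrected_region(depth_image: np.ndarray, box: Tuple[int, int, int, int],
                     d_r: float, d_b: float) -> np.ndarray:
    """Crop the analysis target region and apply equation (3) to each pixel."""
    x0, y0, x1, y1 = box                          # x1 and y1 are exclusive
    region = depth_image[y0:y1, x0:x1].astype(float)
    return region + (d_r - d_b)                   # same offset for all pixels
```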

Determination of Motion of Customer 20: S108, S110

The determination unit 2060 performs image analysis on the decided analysis target region 30 to determine the motion of the customer 20 (S108 and S110). The motion of the customer 20 is, for example, any of (1) a motion of taking out the product 40 from the product shelf 50, (2) a motion of placing the product 40 on the product shelf 50, (3) a motion of not holding the product 40 either before or after contact with the product shelf 50, and (4) a motion of holding the product 40 both before and after contact with the product shelf 50.

Here, “contact between the product shelf 50 and the customer 20” means that the image region of the product shelf 50 and the image region of the customer 20 at least partially overlap in the captured image 12; the product shelf 50 and the customer 20 do not need to touch each other in real space. Further, in (4) described above, the product 40 held by the customer 20 before the contact between the customer 20 and the product shelf 50 may be the same as or different from the product 40 held by the customer 20 after the contact.

The four motions described above are discriminated, for example, by the flow of processing shown in FIG. 10. FIGS. 10 and 11 are flowcharts illustrating the flow of the processing for determining the motion of the customer 20. First, the determination unit 2060 detects a captured image 12 including a scene in which the reference position 24 moves toward the product shelf 50 (S202). For example, the determination unit 2060 computes the distance between the reference position 24 and the product shelf 50 for each of a plurality of time-series captured images 12. In a case where the distance decreases over time across one or more captured images 12, those captured images 12 are detected as captured images 12 including the scene in which the reference position 24 moves toward the product shelf 50.
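The detection in S202, and its counterpart in S206 below, can be sketched as finding frames where the hand-to-shelf distance decreases or increases relative to the previous frame. How the distance is measured is not specified above; the hypothetical `distance_to_shelf` helper assumes an overhead view in which the front edge of the product shelf 50 appears as a horizontal line at a known image row `shelf_front_y`.

```python
from typing import List, Sequence, Tuple


def distance_to_shelf(ref: Tuple[float, float], shelf_front_y: float) -> float:
    # Hypothetical simplification: the shelf front edge is image row shelf_front_y.
    return abs(ref[1] - shelf_front_y)


def moving_frames(distances: Sequence[float], toward: bool) -> List[int]:
    """Indices i whose distance decreased (toward=True) or increased
    (toward=False) compared with frame i - 1."""
    frames: List[int] = []
    for i in range(1, len(distances)):
        delta = distances[i] - distances[i - 1]
        if (toward and delta < 0) or (not toward and delta > 0):
            frames.append(i)
    return frames
```

To follow S206, only receding frames generated after the frames found in S202 would be kept, for example `[i for i in away if i > max(toward)]`.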

Furthermore, the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S202 (S204). In a case where the product 40 is included in the analysis target region 30 (YES in S204), the processing in FIG. 10 proceeds to S206. On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S204), the processing in FIG. 10 proceeds to S214.

In S206, the determination unit 2060 detects, from among the captured images 12 generated later than the captured image 12 detected in S202, a captured image 12 including a scene in which the reference position 24 moves in a direction away from the product shelf 50. For example, the determination unit 2060 computes the distance between the reference position 24 and the product shelf 50 for each of the plurality of captured images 12 in a time series generated later than the captured image 12 detected in S202. In a case where the distance increases over time in one or more captured images 12, those captured images 12 are detected as the captured images 12 including the scene in which the reference position 24 moves in the direction away from the product shelf 50.
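A comparable sketch for S206 (and S214) restricts the search to frames generated after the frame detected in S202. It assumes the per-frame hand-to-shelf distances have already been computed in time order; `min_increase` is a hypothetical tolerance against detection jitter and is not part of the described method:

```python
from typing import List, Sequence


def receding_frames_after(distances: Sequence[float], start_index: int,
                          min_increase: float = 0.0) -> List[int]:
    """Indices of frames generated after `start_index` in which the distance grew.

    distances[i] is the hand-to-shelf distance in frame i (time order);
    start_index is the frame detected in S202. min_increase is a hypothetical
    jitter tolerance; 0.0 accepts any increase.
    """
    return [
        i
        for i in range(start_index + 1, len(distances))
        if distances[i] - distances[i - 1] > min_increase
    ]
```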

Furthermore, the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S206 (S208). In a case where the product 40 is included in the analysis target region 30 (YES in S208), the product 40 is held both in the hand moving toward the product shelf 50 and in the hand moving away from the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(4) motion of holding the product 40 both before and after the contact with the product shelf 50” (S210).

On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S208), the product 40 is held in the hand moving toward the product shelf 50 but not in the hand moving away from the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(2) motion of placing the product 40 on the product shelf 50” (S212).

In S214, the determination unit 2060 detects, from among the captured images 12 generated later than the captured image 12 detected in S202, the captured image 12 including the scene in which the reference position 24 moves in the direction away from the product shelf 50. The detection method is the same as the method executed in S206.

Furthermore, the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S214 (S216). In a case where the product 40 is included in the analysis target region 30 (YES in S216), the product 40 is held in the hand moving away from the product shelf 50 but not in the hand moving toward the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(1) motion of taking out the product 40 from the product shelf 50” (S218).

On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S216), the product 40 is held neither in the hand moving toward the product shelf 50 nor in the hand moving away from the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(3) motion of not holding the product 40 both before and after the contact with the product shelf 50” (S220).
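The branching of FIGS. 10 and 11 amounts to a two-input decision table. A sketch, assuming the product-presence decisions for the approaching frame (S204) and for the leaving frame (S208 or S216) are available as booleans; the enum member names are hypothetical labels for motions (1) to (4):

```python
from enum import Enum


class Motion(Enum):
    TAKE_OUT = 1      # (1) taking out the product from the shelf
    PLACE = 2         # (2) placing the product on the shelf
    EMPTY_BOTH = 3    # (3) not holding a product before and after contact
    HOLDING_BOTH = 4  # (4) holding a product before and after contact


def classify_motion(held_when_approaching: bool, held_when_leaving: bool) -> Motion:
    """Map the two product-presence decisions onto the four motions."""
    if held_when_approaching:  # YES in S204
        # S208: YES -> S210, NO -> S212
        return Motion.HOLDING_BOTH if held_when_leaving else Motion.PLACE
    # NO in S204, then S216: YES -> S218, NO -> S220
    return Motion.TAKE_OUT if held_when_leaving else Motion.EMPTY_BOTH
```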

The following is one example of a method of detecting whether the product 40 is included in the analysis target region 30. The determination unit 2060 first extracts, from the analysis target region 30 decided for each of the plurality of captured images 12 in a time series, the image region excluding the background region, that is, the foreground region. Note that an existing technique can be used to determine the background region of a captured image generated by the camera 10 installed at a predetermined place.

The determination unit 2060 decides that the product 40 is included in the analysis target region 30 in a case where the foreground region includes a region other than the image region representing the hand of the customer 20. However, the determination unit 2060 may decide that the product 40 is included in the analysis target region 30 only in a case where the size of the part of the foreground region excluding the image region representing the hand is equal to or larger than a predetermined size. This prevents noise included in the captured image 12 from being erroneously detected as the product 40.
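One possible realization of this check is sketched below. The description leaves background determination to existing techniques; this sketch swaps in simple per-pixel differencing against a stored background image and assumes a boolean mask of the hand region is already available. `diff_threshold` and `min_area` (standing in for the predetermined size) are hypothetical values:

```python
import numpy as np


def product_in_region(region: np.ndarray, background: np.ndarray,
                      hand_mask: np.ndarray, diff_threshold: int = 30,
                      min_area: int = 50) -> bool:
    """Decide whether a product appears in the analysis target region.

    region, background: crops of the analysis target region (H x W or H x W x C).
    hand_mask: boolean H x W mask of pixels belonging to the customer's hand.
    """
    diff = np.abs(region.astype(np.int32) - background.astype(np.int32))
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # strongest change across colour channels
    foreground = diff > diff_threshold
    # Foreground pixels outside the hand are candidate product pixels.
    non_hand = foreground & ~hand_mask
    # A minimum area keeps small noise blobs from being counted as a product.
    return int(non_hand.sum()) >= min_area
```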

The method of deciding whether the product 40 is included in the analysis target region 30 is not limited to the method described above. Various existing methods of deciding whether the hand of a person included in an image holds an item can be used.

Note that the determination unit 2060 may also determine the motion of the customer 20 from a single captured image 12. In this case, for example, the determination unit 2060 determines the motion of the customer 20 as “holding the product 40” or “not holding the product 40”.

Determination of Product 40

The determination unit 2060 may determine the taken-out product 40 when the customer 20 takes out the product 40 from the product shelf 50. Determining the product 40 means, for example, determining information that distinguishes the product 40 from other products 40 (for example, an identifier or a name of the product 40). Hereinafter, the information for identifying the product 40 is referred to as product identification information.

The determination unit 2060 determines the taken-out product 40 by determining the place in the product shelf 50 from which the customer 20 takes out the product 40. As a premise, it is assumed that the display place of each product 40 is defined in advance. Here, information indicating which product is displayed at each position of the product shelf 50 is referred to as display information. The determination unit 2060 determines, using the captured image 12, the place in the product shelf 50 from which the customer 20 takes out a product 40, and determines the taken-out product 40 using the determined place and the display information.

For example, it is assumed that a specific product 40 is displayed on each stage of the product shelf 50. In this case, the display information indicates the product identification information in association with each stage of the product shelf 50. FIG. 12 is a diagram illustrating the display information in a table format. The table shown in FIG. 12 is referred to as a table 200. The table 200 is created for each product shelf 50. The table 200 has two columns: a stage 202 and product identification information 204. In FIG. 12, the product identification information 204 indicates the identifier of the product 40. For example, in the table 200 representing the display information of the product shelf 50 identified by an identifier s0001, the record in the first row indicates that the product 40 identified by an identifier i0001 is displayed on the first stage of the product shelf 50.
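The display information of FIG. 12 can be held as a nested mapping from shelf identifier to stage to product identifier. In this sketch only the record (s0001, first stage, i0001) comes from the description; the `i0002` entry and the function name `product_on_stage` are hypothetical placeholders:

```python
from typing import Dict, Optional

# Display information (table 200) per product shelf: stage -> product identifier.
# Only s0001 / stage 1 / i0001 is taken from FIG. 12; "i0002" is a hypothetical
# placeholder for illustration.
DISPLAY_INFO: Dict[str, Dict[int, str]] = {
    "s0001": {1: "i0001", 2: "i0002"},
}


def product_on_stage(shelf_id: str, stage: int) -> Optional[str]:
    """Return the product identification information for a stage, or None if unknown."""
    return DISPLAY_INFO.get(shelf_id, {}).get(stage)
```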

The determination unit 2060 determines, using the captured image 12, the stage of the product shelf 50 from which the product 40 is taken out. The determination unit 2060 then acquires the product identification information associated with that stage in the display information, thereby determining the product 40 taken out from the product shelf 50. Several methods of determining the stage of the product shelf 50 from which the product 40 is taken out are illustrated below.

First Method

As a premise, it is assumed that the captured image 12 includes a scene in which the product shelf 50 is imaged from above (refer to FIG. 5). In other words, it is assumed that the camera 10 images the product shelf 50 from above. In this case, a depth camera is used as the camera 10. The depth camera generates a depth image in addition to, or instead of, an ordinary captured image. As described above, the depth image is an image in which the value of each pixel represents the distance between the imaged item and the camera. FIG. 13 is a diagram illustrating a depth image generated by the camera 10. In the depth image in FIG. 13, pixels representing an item closer to the camera 10 are closer to white (brighter), and pixels representing an item farther from the camera 10 are closer to black (darker). Note that, for convenience of illustration, darker portions are drawn densely with larger black dots and brighter portions are drawn sparsely with smaller black dots in FIG. 13.

The determination unit 2060 determines the stage of the product shelf 50 where the reference position 24 is present, based on the value of the pixel representing the reference position 24 in the depth image. For this purpose, a range of distances from the camera 10 for each stage of the product shelf 50 is defined in advance in the display information. FIG. 14 is a diagram illustrating display information indicating the range of distances from the camera 10 for each stage of the product shelf 50. For example, the table 200 in FIG. 14 indicates that the distance between the first stage of the product shelf 50 and the camera 10 is equal to or larger than d1 and less than d2. In other words, the distance between the top of the first stage and the camera 10 is d1, and the distance between the top of the second stage and the camera 10 is d2.

The determination unit 2060 determines the stage of the product shelf 50 where the reference position 24 is present, based on the reference position 24 in a depth image including a scene in which the customer 20 takes out the product 40 and on the display information shown in FIG. 14. The determined stage is regarded as the stage from which the product 40 is taken out. For example, assume that the pixel at the reference position 24 in the depth image indicates that the distance between the reference position 24 and the camera 10 is a, and that a is equal to or larger than d1 and less than d2. In this case, the determination unit 2060 determines, based on the display information shown in FIG. 14, that the reference position 24 is present on the first stage of the product shelf 50. That is, the determination unit 2060 determines that the stage from which the product 40 is taken out is the first stage of the product shelf 50.
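The first method can be sketched as a range lookup over the stage boundaries d1, d2, ... of FIG. 14. This assumes the depth image is indexed as `depth_image[y][x]`, that the reference position 24 is given as integer pixel coordinates, and that the boundaries are sorted in increasing distance; the function name `stage_from_depth` is hypothetical:

```python
import bisect
from typing import Optional, Sequence, Tuple


def stage_from_depth(depth_image: Sequence[Sequence[float]], ref_xy: Tuple[int, int],
                     boundaries: Sequence[float]) -> Optional[int]:
    """Return the 1-based stage whose distance range contains the hand depth.

    boundaries = [d1, d2, ..., d_{n+1}] in increasing order; stage k covers
    d_k <= distance < d_{k+1}. Returns None when the depth is outside every range.
    """
    x, y = ref_xy
    a = depth_image[y][x]  # distance between the reference position and the camera
    if a < boundaries[0] or a >= boundaries[-1]:
        return None
    # bisect_right counts boundaries <= a, which equals the 1-based stage number.
    return bisect.bisect_right(boundaries, a)
```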

Second Method

As a premise, it is assumed that the captured image 12 includes a scene in which the product shelf 50 is viewed from the side. In other words, it is assumed that the camera 10 images the product shelf 50 from the lateral direction. In this case, the determination unit 2060 determines the stage of the product shelf 50 in which the position of the reference position 24 in the height direction (Y coordinate) detected from the captured image 12 lies. The determined stage is regarded as the stage of the product shelf 50 from which the product 40 is taken out. In this case, the captured image 12 may be either a depth image or an ordinary image.

About Case where Plurality of Types of Products 40 are Displayed in One Stage

A plurality of types of products may be displayed on one stage by dividing the stage of the product shelf 50 into a plurality of columns in the horizontal direction. In this case, the determination unit 2060 determines the product 40 by determining both the horizontal position and the height position of the reference position 24 of the hand of the customer 20 who takes out the product 40 from the product shelf 50. The display information then indicates the product identification information for each combination of stage and column. A method of determining the horizontal position of the reference position 24 is described below.

Assume that the camera 10 images the product shelf 50 from above. In this case, the horizontal position of the reference position 24 is determined by the X coordinate of the reference position 24 in the captured image 12.

On the other hand, assume that the camera 10 images the product shelf 50 from the lateral direction. In this case, the determination unit 2060 determines the horizontal position of the reference position 24 using the depth image. The method of determining the horizontal position of the reference position 24 from a depth image in which the product shelf 50 is imaged from the lateral direction is the same as the method of determining the height position of the reference position 24 from a depth image in which the product shelf 50 is imaged from above.
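When a stage is split into columns, the lookup becomes two-dimensional. A sketch, assuming the horizontal coordinate (X coordinate for a camera above, depth value for a lateral camera) and the height coordinate (depth value for a camera above, Y coordinate for a lateral camera) have already been read for the reference position 24, that each set of edges is sorted in increasing order of its coordinate, and that the display information is keyed by (stage, column); all names are hypothetical:

```python
import bisect
from typing import Dict, Optional, Sequence, Tuple


def _bin(value: float, edges: Sequence[float]) -> Optional[int]:
    """1-based index k such that edges[k-1] <= value < edges[k], or None."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect.bisect_right(edges, value)


def product_at(horizontal: float, height: float,
               column_edges: Sequence[float], stage_edges: Sequence[float],
               display_info: Dict[Tuple[int, int], str]) -> Optional[str]:
    """Look up the product for the (stage, column) cell containing the hand."""
    column = _bin(horizontal, column_edges)
    stage = _bin(height, stage_edges)
    if column is None or stage is None:
        return None
    return display_info.get((stage, column))
```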

Although the method of determining the product 40 taken out from the product shelf 50 has been described, the determination unit 2060 may determine the product 40 placed on the product shelf 50 by a similar method. In this case, however, the determination unit 2060 uses a captured image 12 including a scene in which the product 40 is placed on the product shelf 50.

Here, assume that “(4) motion of holding the product 40 both before and after the contact with the product shelf 50” is determined as the motion of the customer 20. In this case, the determination unit 2060 may decide, using the method of determining the product 40 described above, whether the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are the same. For example, the determination unit 2060 determines the product 40 held before the contact by the same method as the method of determining the product 40 to be placed on the product shelf 50, and determines the product 40 held after the contact by the same method as the method of determining the product 40 to be taken out from the product shelf 50. In a case where the two determined products 40 are the same, the determination unit 2060 decides that the products 40 held by the customer 20 before and after the contact are the same. In this case, the motion of the customer 20 can be described as a “motion of reaching for the product shelf 50 to place the product 40, but not placing the product 40”. On the other hand, in a case where the two determined products 40 are different from each other, the determination unit 2060 decides that the products 40 held by the customer 20 before and after the contact are different from each other. In this case, the motion of the customer 20 can be described as a “motion of placing the held product 40 and taking out another product 40”.

However, this decision may also be made without specifically determining the product 40. For example, the determination unit 2060 computes the magnitude of the difference (a difference in area or color) between the foreground region of the analysis target region 30 before the contact between the customer 20 and the product shelf 50 and the foreground region of the analysis target region 30 after the contact. The determination unit 2060 decides that the products 40 before and after the contact are different from each other in a case where the computed magnitude is equal to or larger than a predetermined value, and decides that they are the same in a case where the magnitude is less than the predetermined value.
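The comparison without product identification might be sketched as follows. The description only says the difference is one of area or color; this sketch measures area as the foreground pixel count and color as the Euclidean distance between mean foreground colors, and both thresholds are hypothetical stand-ins for the predetermined value:

```python
import numpy as np


def products_differ(region_before: np.ndarray, fg_before: np.ndarray,
                    region_after: np.ndarray, fg_after: np.ndarray,
                    area_threshold: int = 100,
                    color_threshold: float = 40.0) -> bool:
    """Decide whether the held product changed across the contact.

    region_*: H x W x 3 crops of the analysis target region.
    fg_*: boolean H x W foreground masks of those crops.
    """
    # Area difference: change in the number of foreground pixels.
    area_diff = abs(int(fg_before.sum()) - int(fg_after.sum()))
    if area_diff >= area_threshold:
        return True
    if not fg_before.any() or not fg_after.any():
        return False  # with an empty foreground only the area test applies
    # Color difference: distance between mean foreground colors.
    mean_before = region_before[fg_before].astype(np.float64).mean(axis=0)
    mean_after = region_after[fg_after].astype(np.float64).mean(axis=0)
    color_diff = float(np.linalg.norm(mean_before - mean_after))
    return color_diff >= color_threshold
```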

In addition, for example, the determination unit 2060 may decide whether the products 40 before and after the contact are the same based on the difference between the reference positions 24 before and after the contact between the customer 20 and the product shelf 50. In this case, the determination unit 2060 determines, using the display information described above, the stage of the product shelf 50 where the reference position 24 is present before the contact and the stage where it is present after the contact. In a case where these two stages are different from each other, the determination unit 2060 decides that the products 40 before and after the contact are different from each other. On the other hand, in a case where the two stages are the same, the determination unit 2060 decides that the products 40 before and after the contact are the same.
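Whichever comparison is used (product identifiers or the stages where the reference position 24 was found), the refinement of motion (4) follows the same rule. A minimal sketch with a hypothetical function name, treating equal values as the same product:

```python
from typing import Hashable


def describe_holding_both(before: Hashable, after: Hashable) -> str:
    """Refine motion (4) by comparing what was determined before and after contact.

    before and after may be product identifiers or stage numbers; equal values
    are taken to mean that the same product was held.
    """
    if before == after:
        return "reached for the shelf to place the product, but did not place it"
    return "placed the held product and took out another product"
```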

Utilization Method of Motion of Customer 20 Determined by Determination Unit 2060

The motion of the customer 20 determined by the determination unit 2060 can be used to analyze actions performed by the customer 20 in front of the product shelf 50 (so-called front-shelf actions). For this purpose, the determination unit 2060 outputs various pieces of information, such as the motion performed in front of the product shelf 50 by each customer 20, the date and time when the motion is performed, and the product 40 subjected to the motion. This information is, for example, stored in a storage apparatus connected to the information processing apparatus 2000 or transmitted to a server apparatus communicably connected to the information processing apparatus 2000. Various existing methods can be used to analyze front-shelf actions based on the motions of the customer 20 performed in front of the product shelf 50.

Note that the usage scene of the information processing apparatus 2000 is not limited to determining the motion of a customer in a store. For example, as described above, the information processing apparatus 2000 can be used to determine the motion of a factory worker or the like. In this case, for example, the motion of each worker determined by the information processing apparatus 2000 is compared with a motion defined in advance for that worker, which makes it possible to confirm whether the worker correctly performs a predetermined job.

The example embodiments of the present invention have been described with reference to the drawings. However, the example embodiments are merely examples of the present invention, and various configurations other than those described above can be employed.

1. An information processing apparatus comprising: at least one memory configured to store instructions; and at least one processor configured to execute the instructions to: detect a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; decide an analysis target region in the captured image using the detected reference position; and determine a motion of the person by analyzing the analysis target region decided for each of a plurality of captured images generated at different timepoints.
 2. The information processing apparatus according to claim 1, wherein the plurality of captured images includes a first captured image including a scene in which the reference position moves toward a product shelf, and a second captured image including a scene in which the reference position moves in a direction away from the product shelf from among the captured images generated later than the first captured image.
 3. The information processing apparatus according to claim 2, wherein the first captured image is imaged in a case where a distance between the reference position and the product shelf decreases over time.
 4. The information processing apparatus according to claim 2, wherein the second captured image is imaged in a case where a distance between the reference position and the product shelf increases over time.
 5. The information processing apparatus according to claim 1, wherein the at least one processor is further configured to execute the instructions to determine at least one of a motion of the person taking out an item from the display place, a motion of the person not holding an item both before and after contact with the display place, a motion of the person placing an item on the display place, and a motion of the person holding an item both before and after contact with the display place.
 6. The information processing apparatus according to claim 5, wherein the at least one processor is further configured to execute the instructions to determine an item which is a target of the motion of the person based on display information indicating a position of each item in the display place and a position of the reference position in a height direction included in the analysis target region.
 7. The information processing apparatus according to claim 1, wherein the at least one processor is further configured to execute the instructions to decide a size of the analysis target region based on a reference area defined in advance for the hand region, or a reference distance value defined in advance.
 8. A control method executed by a computer, the method comprising: detecting a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; deciding an analysis target region in the captured image using the detected reference position; and determining a motion of the person by analyzing the analysis target region decided for each of a plurality of captured images generated at different timepoints.
 9. The control method according to claim 8, wherein the plurality of captured images includes a first captured image including a scene in which the reference position moves toward a product shelf, and a second captured image including a scene in which the reference position moves in a direction away from the product shelf from among the captured images generated later than the first captured image.
 10. The control method according to claim 9, wherein the first captured image is imaged in a case where a distance between the reference position and the product shelf decreases over time.
 11. The control method according to claim 9, wherein the second captured image is imaged in a case where a distance between the reference position and the product shelf increases over time.
 12. The control method according to claim 8, wherein the method comprises determining at least one of a motion of the person taking out an item from the display place, a motion of the person not holding an item both before and after contact with the display place, a motion of the person placing an item on the display place, and a motion of the person holding an item both before and after contact with the display place.
 13. The control method according to claim 12, wherein the method comprises determining an item which is a target of the motion of the person based on display information indicating a position of each item in the display place and a position of the reference position in a height direction included in the analysis target region.
 14. The control method according to claim 8, wherein the method comprises deciding a size of the analysis target region based on a reference area defined in advance for the hand region, or a reference distance value defined in advance.
 15. A non-transitory computer-readable medium storing a program for causing a computer to perform operations, the operations comprising: detecting a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; deciding an analysis target region in the captured image using the detected reference position; and determining a motion of the person by analyzing the analysis target region decided for each of a plurality of captured images generated at different timepoints.
 16. The non-transitory computer-readable medium according to claim 15, wherein the plurality of captured images includes a first captured image including a scene in which the reference position moves toward a product shelf, and a second captured image including a scene in which the reference position moves in a direction away from the product shelf from among the captured images generated later than the first captured image.
 17. The non-transitory computer-readable medium according to claim 16, wherein the first captured image is imaged in a case where a distance between the reference position and the product shelf decreases over time.
 18. The non-transitory computer-readable medium according to claim 16, wherein the second captured image is imaged in a case where a distance between the reference position and the product shelf increases over time.
 19. The non-transitory computer-readable medium according to claim 15, wherein the operations comprise determining at least one of a motion of the person taking out an item from the display place, a motion of the person not holding an item both before and after contact with the display place, a motion of the person placing an item on the display place, and a motion of the person holding an item both before and after contact with the display place.
 20. The non-transitory computer-readable medium according to claim 19, wherein the operations comprise determining an item which is a target of the motion of the person based on display information indicating a position of each item in the display place and a position of the reference position in a height direction included in the analysis target region.